Abstract. Here we present a new fixed parameter tractable algorithm to compute the 
hybridization number r of two rooted, not necessarily binary phylogenetic trees on taxon set 
X in time (6^r r!) · poly(n), where n = |X|. The novelty of this approach is its use of terminals, 
which are maximal elements of a natural partial order on X, and several insights from the 
softwired clusters literature. This yields a surprisingly simple and practical bounded-search 
algorithm and offers an alternative perspective on the underlying combinatorial structure 
of the hybridization number problem. 

1 Introduction 

The rooted phylogenetic tree (henceforth, tree) is the traditional model for modelling the evolution 
of a set of species (or, more generally, taxa) X (see e.g. [9,10,23]). A rooted phylogenetic network 
(henceforth, network) is a generalisation from trees to directed acyclic graphs which allows retic- 
ulate evolutionary phenomena such as hybridization, recombination and horizontal gene transfer 
to be incorporated (see Figure 1). For detailed background information on networks we refer the 
reader to [1211311412^121122] . 
Fig. 1. An example of a (binary) rooted phylogenetic network on X = {a, . . . , I}. This network 
has five reticulation nodes. 

One use of networks, motivated in particular by the need to merge a set of discordant gene 
trees into a species network [21], is the following. Given a set of trees T, where each tree in T 
has the same set of taxa X, construct a "most parsimonious" network which displays all the 
trees in T. If we define "most parsimonious" to mean: has as few reticulation nodes (i.e. nodes 
with indegree two or higher) as possible, we obtain the hybridization number problem [2,3]. There 
has been extensive research into perhaps the simplest possible variant of this problem; this is 
when T contains two binary (i.e. fully resolved) trees. Unfortunately, even this stylized version 
of the problem is computationally difficult; it is NP-hard and in a theoretical sense difficult to 
approximate well [5,18]. On the other hand, there has been considerable progress in developing fixed 
parameter tractable (FPT) algorithms for the problem. Essentially, these are algorithms which can 
determine whether the hybridization number of two trees is at most r in time f(r) · poly(n), where 
n = |X| and f(r) is a function that does not depend on n (see [H] for an introduction to fixed 
parameter tractability). The idea of such algorithms is that, by decoupling n and r, the running 
time of the algorithm tends to grow more slowly than algorithms with a running time of the form 
O(n^{f(r)}). The first such algorithms were described in [4,7] and the current theoretical 
state-of-the-art is an algorithm with running time (3.18^r) · poly(n) [27]. There are also a number 
of very fast software packages in existence that are wholly or partially based on insights from 
fixed parameter tractability [1,6]. 

However, what if T contains more than two trees and/or contains trees that are not fully 
resolved? Algorithms to compute the hybridization number of such T are necessary, because this 
more accurately reflects the type of trees that emerge in applied phylogenetics [20] . In this article 
we are interested in the situation when T contains two not necessarily fully resolved trees on X . 
(We henceforth refer to such trees as nonbinary, noting that this classification includes binary 
trees as a special case). Given that this problem is a generalisation of the binary case, it inherits 
all the negative results from that case, but not necessarily the positive results. Indeed, there are 
far fewer positive results for nonbinary. A number of non-trivial technicalities arise because in the 
nonbinary case we only require that the network displays some refinement of each tree i.e. the 
image of the tree contained in the network can be more resolved than the original tree [TO] . This is 
a natural and desirable definition given that biologists often use nodes with outdegree 3 or higher 
in trees to denote uncertainty, rather than a hard topological constraint. 

Recently there have been two non-FPT algorithms implemented (both of which are available 
in the package Dendroscope [15]) to solve the nonbinary problem in polynomial time when 
the hybridization number is bounded [11,26]. The nonbinary problem is, furthermore, FPT. This 
was established in [19] using kernelization. Unfortunately, mainly due to the very idiosyncratic 
behaviour of common chains in the nonbinary case, the analysis given in [19] is rather long and 
complex, and the (weighted) kernel they describe is also rather large, containing at most 89r 
taxa; the size of the unweighted kernel is quadratic in r. As far as we are aware the algorithm in 
[19] has not been implemented. 

In this article we present an alternative FPT algorithm for the nonbinary problem that is based on 
bounded-search rather than kernelization, with running time (6^r r!) · poly(n). The resulting algorithm is 
extremely simple and amenable to implementation (it manages to completely avoid the concept 
of chains) and the analysis of correctness is comparatively straightforward. The algorithm builds 
heavily on a number of basic results from the softwired cluster literature [12,13], in particular [17]. 
This literature concerns a slightly different methodology for constructing phylogenetic networks, 
but as observed in [25,17] the optima of the models synchronise in the case of two input trees, 
allowing results and concepts from one methodology to be used in the other. 

The simplicity of our new algorithm stems from a careful examination of a natural partial order 
(and its maximal elements, which we call terminals) on X, which turns out to be closely linked 
to hybridization number. This partial order appeared earlier in [16] and [17] but was used in a 
slightly different way. Via the observations in [17] the earlier (and more general) results in [TO] also 
imply an FPT algorithm via softwired clusters for the nonbinary case, but with an astronomical 
running time. The added value of the present article is that, by making heavy use of the fact that 
there are only two trees in the input, we are able to obtain a significantly simplified and optimized 
algorithm that can actually be used in practice. 

For completeness we have implemented a prototype version of the algorithm, available upon 
request. However, perhaps the best use of the algorithm is to integrate it into existing, well- 
supported non-FPT algorithms for the nonbinary problem (such as the 2012 release of Cass 
[24,26]) to bound their search space and to thus upgrade their status to FPT. 



2 Preliminaries 



2.1 Trees, networks and clusters 
Fig. 2. In (b) we see that N displays the tree in (a), and in (c) we see that N displays a binary 
refinement of the tree in (d). The dotted edges denote the reticulation edges that should be deleted 
to obtain the required tree. 

Consider a set X of taxa. A rooted phylogenetic network (on X), henceforth network, is a 
directed acyclic graph with a single node with indegree zero (the root), no nodes with both in- 
degree and outdegree equal to 1, and leaves bijectively labeled by X. The indegree of a node v 
is denoted δ^-(v) and v is called a reticulation if δ^-(v) ≥ 2, otherwise it is called a tree node. 
An edge (u, v) is called a reticulation edge if its target node v is a reticulation. When count- 
ing reticulations in a network, we count reticulations with more than two incoming edges more 
than once because, biologically, these reticulations represent several reticulate evolutionary events. 
Therefore, we formally define the reticulation number of a network N = (V, E) as 

\[ r(N) = \sum_{v \in V :\, \delta^-(v) > 0} \bigl(\delta^-(v) - 1\bigr) = |E| - |V| + 1. \]
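As an illustration, the two expressions for r(N) can be checked against each other on a small example. The following is only a sketch: the edge-list encoding, the function name `reticulation_number` and the seven-node network are assumptions made for this example and do not come from the article.

```python
from collections import defaultdict

def reticulation_number(edges):
    """Return r(N) for a network given as a list of directed edges (u, v)."""
    indegree = defaultdict(int)
    nodes = set()
    for u, v in edges:
        indegree[v] += 1
        nodes.update((u, v))
    # Sum of (indegree - 1) over all nodes with indegree > 0.
    by_sum = sum(d - 1 for d in indegree.values())
    # Closed form |E| - |V| + 1, valid when there is a single root.
    by_count = len(edges) - len(nodes) + 1
    assert by_sum == by_count, "the two formulas disagree: is there one root?"
    return by_sum

# Hypothetical network on X = {a, b, c}: node h is the only reticulation.
example_edges = [("root", "u"), ("root", "v"), ("u", "a"), ("u", "h"),
                 ("v", "h"), ("v", "c"), ("h", "b")]
# A tree on the same taxa, which should have reticulation number zero.
example_tree = [("root", "u"), ("root", "c"), ("u", "a"), ("u", "b")]
```

Here `reticulation_number(example_edges)` counts one reticulation event, while the tree has none.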

A rooted phylogenetic tree on X, henceforth tree, is simply a network that has reticulation 
number zero. We say that a network N on X displays a tree T if T can be obtained from N by 
performing a series of node and edge deletions and eventually by suppressing nodes with both 
indegree and outdegree equal to 1 (see Figure [2]). We assume without loss of generality that each 
reticulation has outdegree at least one. Consequently, each leaf has indegree one. We say that a 
network is binary if every reticulation node has indegree 2 and outdegree 1 and every tree node 
that is not a leaf has outdegree 2. 

Proper subsets of X are called clusters, and a cluster C is a singleton if |C| = 1. We say that 
an edge (u, v) of a tree represents a cluster C ⊂ X if C is the set of taxa descendants of v. A 
tree T represents a cluster C if it contains an edge that represents C. For example, the tree in 
Figure 2(a) represents {c, d, e} but not {d, e, f}. We say that N represents C "in the softwired 
sense" if N displays some tree T on X such that T represents C. In this article we only consider 
the softwired notion of cluster representation and henceforth assume this implicitly. A network 
represents a set of clusters C if it represents every cluster in C (and possibly more). For a set C 
of clusters on X we define r(C) as min{r(N) | N represents C}; we refer to this as the reticulation 
number of C. We say that two clusters C1, C2 ⊂ X are compatible if either C1 ∩ C2 = ∅ or C1 ⊆ C2 
or C2 ⊆ C1, and incompatible otherwise. A set of clusters C is compatible if all clusters in C are 
mutually compatible. 
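The compatibility test is simple enough to state directly in code. This is a minimal sketch, assuming clusters are encoded as Python sets of taxon names; the function names are chosen for this example only.

```python
from itertools import combinations

def compatible(c1, c2):
    """Two clusters are compatible if they are disjoint or nested."""
    c1, c2 = frozenset(c1), frozenset(c2)
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1

def is_compatible_set(clusters):
    """A set of clusters is compatible if its clusters are pairwise compatible."""
    return all(compatible(a, b) for a, b in combinations(clusters, 2))
```

For instance, {a, b} and {a, b, c} are nested and hence compatible, whereas {a, b} and {b, c} overlap without nesting and are incompatible.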



2.2 The equivalence of (maximal) common pendant subtrees and (maximal) 
ST-sets 

Let T be a tree on X. We write Cl(T) to denote the set of clusters represented by edges of T, and 
for a set of trees T on X we write Cl(T) = ⋃_{Ti ∈ T} Cl(Ti). We say that a (binary) tree T' on X is a 
(binary) refinement of T if Cl(T) ⊆ Cl(T') (see Figure 2). We say two trees T1 and T2 on X have 
a common refinement if there exists a tree T' on X such that Cl(T1) ∪ Cl(T2) ⊆ Cl(T'), where 
the last condition is equivalent to saying that the set of clusters Cl(T1) ∪ Cl(T2) is compatible. 
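To make Cl(T) and the refinement condition concrete, here is a small sketch. It assumes (purely for illustration) that a tree is encoded as nested tuples whose leaves are taxon names; the example trees are hypothetical and are not those of the figures.

```python
def clusters_of_tree(tree):
    """Cl(T): the clusters represented by the edges of a nested-tuple tree."""
    found = set()

    def leaves_below(node, is_root):
        if isinstance(node, str):
            taxa = frozenset([node])
        else:
            taxa = frozenset().union(*(leaves_below(child, False) for child in node))
        if not is_root:  # the root has no incoming edge, so X itself is excluded
            found.add(taxa)
        return taxa

    leaves_below(tree, True)
    return found

def is_refinement(refined, tree):
    """T' is a refinement of T if Cl(T) is a subset of Cl(T')."""
    return clusters_of_tree(tree) <= clusters_of_tree(refined)

# Hypothetical nonbinary tree and two candidate binary trees on {a, b, c, d}.
nonbinary = (("a", "b"), "c", "d")
refined = ((("a", "b"), "c"), "d")
other = (("a", ("b", "c")), "d")
```

In this example `refined` resolves the root of `nonbinary` without destroying the cluster {a, b}, while `other` does not contain {a, b} and is therefore not a refinement.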
We say that a tree T* on X* ⊂ X is a pendant subtree of T if there is a refinement T' of T 
such that X* ∈ Cl(T'). Note that this definition does not depend on the topology of T* so we 
can equivalently say that X* is a pendant subtree of T. A pendant subtree X* is non-trivial if 
|X*| > 1. Given two trees T1, T2 on X we say that X* ⊂ X is a common pendant subtree if X* is 
a pendant subtree of both T1 and T2 and T1|X* and T2|X* have a common refinement. (As usual 
T|X' for X' ⊂ X refers to the tree obtained by suppressing nodes with indegree and outdegree 
equal to 1 in the minimal subtree of T that connects all elements of X'). Note that our definition 
of common pendant subtree is consistent with [19], which we follow. 

Given a set S ⊆ X of taxa, we use C \ S to denote the result of removing all elements of S from 
each cluster in C and we use C|S to denote C \ (X \ S) (the restriction of C to S). Following [17], 
we say that a set S ⊆ X is an ST-set with respect to C, if S is compatible with all clusters in C 
and any two clusters C1, C2 ∈ C|S are compatible. An ST-set S is maximal if there is no ST-set T 
with S ⊂ T. The maximal ST-sets are unique, partition X and can be computed in polynomial 
time [17]. 
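The ST-set condition can be checked directly from the definition. The sketch below assumes clusters are encoded as frozensets, drops clusters that become empty under restriction, and uses a hypothetical cluster set on five taxa; it is not the polynomial-time procedure of [17] for finding maximal ST-sets.

```python
from itertools import combinations

def compatible(c1, c2):
    return c1.isdisjoint(c2) or c1 <= c2 or c2 <= c1

def restrict(clusters, s):
    """C|S: intersect every cluster with S, discarding empty results."""
    s = frozenset(s)
    return {frozenset(c) & s for c in clusters} - {frozenset()}

def is_st_set(s, clusters):
    """S is an ST-set if it is compatible with C and C|S is compatible."""
    s = frozenset(s)
    if not all(compatible(s, frozenset(c)) for c in clusters):
        return False
    return all(compatible(a, b) for a, b in combinations(restrict(clusters, s), 2))

# Hypothetical cluster set, e.g. Cl of (((a,b),c),(d,e)) and ((a,(b,c)),(d,e)).
example_clusters = [frozenset(c) for c in
                    ("a", "b", "c", "d", "e", "ab", "abc", "bc", "de")]
```

In this example {d, e} is an ST-set, whereas {a, b, c} is not, because its restriction still contains the incompatible pair {a, b}, {b, c}.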
Fig. 3. The two trees shown in (a) and (b) have maximal ST-sets (i.e. maximal common pendant 
subtrees) {a}, {b, c}, {d, e, f}, {g}, shown in colour. In (c) and (d) we show the result of collapsing 
the maximal ST-sets in (a) and (b) (respectively) into single taxa. 

In [13, Lemma 6] it is proven that, if C = Cl(T1) ∪ Cl(T2), and X* is an ST-set of C, then 
for each i ∈ {1, 2}, there exists a node vi of Ti such that X* is exactly equal to the union of 
the clusters represented by some (not necessarily strict) subset of the edges outgoing from vi. 
From this it follows that X* is a (maximal) ST-set of Cl(T1) ∪ Cl(T2) if and only if X* is a 
(maximal) common pendant subtree of T1 and T2. We will make heavy use of this equivalence and 
use the concepts interchangeably. In particular, all the maximal ST-sets of C = Cl(T1) ∪ Cl(T2) are 
singletons (in which case we say C is ST-collapsed [17]) if and only if T1 and T2 have no non-trivial 
common pendant subtrees. A related operation is to create an ST-collapsed set of clusters by 
collapsing all maximal ST-sets into single taxa as shown in Figure 3. Collapsing maximal ST-sets 
does not change the reticulation number of the set of clusters (because there always exists an 
optimal network in which the maximal ST-sets are "pendant" [17, Corollary 11]). 



2.3 The special case of (clusters obtained from) two trees 

Given two trees T = {T1, T2} on X, we (again following [19]) define h(T) (the hybridization number 
of T) as the smallest value of r(N) ranging over all networks N on X such that N displays a binary 
refinement of T1 and a binary refinement of T2. In [17, Observation 9] we note that the emphasis 
on binary refinements does not sacrifice generality. Furthermore, from [25, Lemma 2] we may 
assume without loss of generality that in the definition of h(T), N can be restricted to being 
binary. Observe that, if T is an arbitrary set of trees on X, r(Cl(T)) ≤ h(T). This holds because 
if a network displays a (refinement of a) tree T then it certainly also represents all the clusters in 
Cl(T). For |T| > 2 this inequality can be strict [3S]. However, in [17, Lemma 12] it is proven that 
if T = {T1, T2} are two trees on X, and C = Cl(T), then r(C) = h(T). Unfortunately, even in this 
special case, if N represents all clusters in C it does not necessarily display (binary refinements 
of) T1 and T2 [25]. Fortunately a polynomial-time, reticulation-number preserving transformation 
is possible, which we describe later in Section 3. 

3 The structure of optimal solutions 

We begin with some simple results which formalize the idea that, when T contains exactly two 
trees, the problem has "optimal substructure" i.e. optimal solutions can be constructed from 
arbitrary optimal solutions for well-chosen subproblems. We begin with a focus on clusters, but 
then explicitly link this to trees in Lemma 2 and Corollary 1. 

Observation 1. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two 
trees on X with no non-trivial common pendant subtrees, and r(C) ≥ 1. Then there exists x ∈ X 
such that r(C \ {x}) < r(C). 

Proof. Consider without loss of generality a binary network N which represents C, where r(N) = 
r(C). By acyclicity N contains at least one Subtree Below a Reticulation (SBR) [17], i.e. a node 
u with indegree 1 whose parent is a reticulation, and such that no reticulation can be reached by 
a directed path from u. Let X' be the set of taxa reachable from u by directed paths. X' is an 
ST-set, so |X'| = 1 (because C is ST-collapsed). Let x be the single taxon in X'. Deleting x and 
its reticulation parent from N (and tidying up the resulting network in the usual fashion; see footnote 1) creates 
a network N' on X \ {x} with r(N') < r(N) that represents C \ {x}. □ 

Lemma 1. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees on 
X with no non-trivial common pendant subtrees, and r(C) ≥ 1. Then for each x ∈ X it holds that 
r(C) - 1 ≤ r(C \ {x}) ≤ r(C). 

Proof. The second ≤ is immediate because removing a taxon from a cluster set cannot raise the 
reticulation number of the cluster set. The first ≤ holds because in [17, Lemma 10] it is shown 
how, given any network N' on X \ {x} that represents C \ {x}, we can extend N' to obtain a 
network N on X that represents C such that r(N) ≤ r(N') + 1. □ 

We recall the following definition from [17]. For a set of clusters C on X, we call (S1, S2, ..., Sp) 
(p ≥ 0) an ST-set tree sequence of length p if S1 is an ST-set of C, S2 is an ST-set of C \ S1, S3 is an 
ST-set of C \ S1 \ S2 (and so on) and if all the clusters in C \ S1 \ ... \ Sp are mutually compatible, 
i.e. can be represented by a tree. If C = Cl(T) where T = {T1, T2} are two trees on X, then 
r(C) is exactly equal to the minimum length of an ST-set tree sequence for C [17, Corollary 9]. 
Essentially, the ST-set tree sequence describes an order in which common pendant subtrees can 
be iteratively pruned from T1 and T2 to obtain a common tree T. As an example, the two trees 
in Figure 3(a) and (b) have a minimum-length ST-set tree sequence ({b, c}, {d, e, f}), and the 
hybridization number of these two trees is indeed 2. 

1 Specifically, for as long as necessary applying the following tidying-up operations until they are no longer 
needed: deleting any node with outdegree zero that is not labelled by an element of X; suppressing all 
nodes with indegree and outdegree both equal to 1; replacing multi-edges with single edges; deleting 
nodes with indegree 0 and outdegree 1 [17]. 
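The definition of an ST-set tree sequence translates into a short validity check. The following is a sketch under the same illustrative assumptions as before (clusters as frozensets, empty clusters discarded after removal); the three-taxon cluster set is hypothetical and the code only verifies a given sequence, it does not search for a shortest one.

```python
from itertools import combinations

def compatible(a, b):
    return a.isdisjoint(b) or a <= b or b <= a

def all_compatible(clusters):
    return all(compatible(a, b) for a, b in combinations(clusters, 2))

def remove(clusters, s):
    """C minus S: delete the taxa in S from every cluster, dropping empty results."""
    return {c - s for c in clusters} - {frozenset()}

def is_st_set(s, clusters):
    restricted = {c & s for c in clusters} - {frozenset()}
    return all(compatible(s, c) for c in clusters) and all_compatible(restricted)

def is_st_set_tree_sequence(sequence, clusters):
    """Check that each S_i is an ST-set of what remains and that the rest is compatible."""
    current = {frozenset(c) for c in clusters}
    for s in map(frozenset, sequence):
        if not is_st_set(s, current):
            return False
        current = remove(current, s)
    return all_compatible(current)

# Hypothetical example: Cl of the trees ((a,b),c) and (a,(b,c)).
example_clusters = [frozenset(c) for c in ("a", "b", "c", "ab", "bc")]
```

In this example the empty sequence is rejected because {a, b} and {b, c} are incompatible, while the length-1 sequence ({b}) is accepted, in line with a reticulation number of 1 for this cluster set.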



Observation 1 and Lemma 1 show that, in an ST-collapsed cluster set, there always exists at 
least one taxon x such that r(C \ {x}) = r(C) - 1, and that this is the best possible decrease in 
reticulation number. If we somehow locate such an x (it does not matter which one), construct 
C \ {x}, compute its maximal ST-sets, collapse them, and then repeat this until we obtain a 
compatible set of clusters, we are actually constructing a minimum-length ST-set tree sequence 
(S1, ..., S_{r(C)}) of C. (Note that the actual Si can easily be obtained by reversing any collapsing 
operations). Such a sequence not only tells us r(C), it also instructs us how to construct in 
polynomial time a network N which represents all the clusters in C such that r(C) = r(N) [17, Theorem 
3]. Less obviously, it also tells us how to construct a network N with r(N) = r(C) = h(T) which 
displays the two trees that C came from: 

Lemma 2. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees on 
X. Let (S1, ..., Sp) be an ST-set tree sequence of C. Then in polynomial time we can construct a 
network N that displays binary refinements of T1 and T2 such that r(N) = p. 

Proof. (Figure 4 shows a slightly stylized example of the following). Let X0 = X and let Xi = 
X_{i-1} \ Si, for 1 ≤ i ≤ p. Define Ci = C|Xi, for 0 ≤ i ≤ p. By assumption, the clusters in Cp 
can be represented by a tree. This is equivalent to saying that T1|Xp and T2|Xp have a common 
refinement. We construct in polynomial time an arbitrary binary tree T on Xp that displays these 
clusters; T will also be a common binary refinement of T1|Xp and T2|Xp. Let Np = T. We now 
show how to construct a network N_{i-1} that displays binary refinements of T1|X_{i-1} and T2|X_{i-1}, 
given an arbitrary network Ni that displays binary refinements of T1|Xi and T2|Xi, for 1 ≤ i ≤ p. 
By definition, Si is an ST-set of C_{i-1}. Si thus corresponds to a common pendant subtree of T1|X_{i-1} 
and T2|X_{i-1}, and indeed T1|Xi and T2|Xi are exactly the trees obtained by pruning Si from T1|X_{i-1} 
and T2|X_{i-1}. So, reversing this pruning means that T1|X_{i-1} and T2|X_{i-1} can be obtained from 
T1|Xi and T2|Xi (respectively) by re-grafting Si at a particular vertex or edge. Specifically, let T* 
be an arbitrary binary tree that represents C_{i-1}|Si; this will also be a common binary refinement 
of the common pendant subtree Si. Now, N_{i-1} can be obtained from Ni by extending the images 
of T1|Xi and T2|Xi inside Ni as follows: we introduce T* below a new reticulation and attach 
this reticulation at (or, if necessary, slightly above) the two aforementioned re-grafting points. 
There are some small technicalities (such as the need for a "dummy root" [17]) but we omit these 
details. □ 




Fig. 4. A demonstration of the construction described in Lemma 2. The trees in Figure 3(a) 
and (b) have a minimum-length ST-set tree sequence ({b, c}, {d, e, f}) and here we show how to 
construct a network N with r(N) = 2 that displays binary refinements of both these trees, by 
re-introducing the elements of the ST-set tree sequence in reverse order. 



Corollary 1. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees 
on X. Let N be a network on X that represents all the clusters in C. Then in polynomial time we 
can construct a network N' that displays binary refinements of T1 and T2 such that r(N') ≤ r(N). 

Proof. If N is a tree we can simply take a binary refinement of N and we are done. Otherwise, 
N contains at least one SBR. The taxa in an SBR form an ST-set. So if we identify an SBR of N 
(which can easily be done in polynomial time), remove it (and tidy up in the usual fashion), and 
repeat this until we obtain a tree, we obtain an ST-set tree sequence of length at most r(N). (It 
will be less than r(N) if removing some SBR causes more than one reticulation to disappear from 
the network when tidying up). This dismantling of N is described in more detail in [17, Lemma 
7]. We can then apply Lemma 2 to construct the network. □ 

Lemma 2 and Corollary 1 allow us for the remainder of the article to focus only on clusters. 

4 Terminals 

As we have seen, computing r(C) (and an accompanying optimal network) essentially boils down 
to repeatedly identifying some taxon x such that r(C \ {x}) = r(C) - 1. The key to attaining fixed 
parameter tractability is to construct a "small" X' ⊆ X which is guaranteed to contain at least 
one such taxon x. This brings us to the following concept. 

Given a cluster set C and x, y ∈ X, we write x →_C y if and only if every non-singleton cluster 
in C containing x also contains y (see footnote 2). We say that a taxon x ∈ X is a terminal if there 
does not exist x' ∈ X such that x ≠ x' and x →_C x'. 
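The relation and the resulting set of terminals can be computed directly from the definition. This is a minimal sketch; the function names `arrow` and `terminals` and the three-taxon cluster set are assumptions for this example.

```python
def arrow(clusters, x, y):
    """True if every non-singleton cluster containing x also contains y."""
    return all(y in c for c in clusters if len(c) > 1 and x in c)

def terminals(clusters, taxa):
    """Taxa x with no other taxon x' such that x -> x'."""
    return {x for x in taxa
            if not any(arrow(clusters, x, y) for y in taxa if y != x)}

# Hypothetical ST-collapsed example: Cl of the trees ((a,b),c) and (a,(b,c)).
example_clusters = [frozenset(c) for c in ("a", "b", "c", "ab", "bc")]
example_taxa = {"a", "b", "c"}
```

In this example a →_C b and c →_C b, so only b is a terminal. Note that a taxon appearing in no non-singleton cluster points (vacuously) to every other taxon and is therefore never reported as a terminal.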

Observation 2. Let C be an ST-collapsed set of clusters on X such that r(C) ≥ 1. Then the 
relation →_C is a partial order on X, the terminals are the maximal elements of the partial order 
and each non-singleton cluster of C contains at least one terminal. 

Proof. The relation →_C is clearly reflexive and transitive. To see that it is anti-symmetric, suppose 
there exist two elements x ≠ y ∈ X such that x →_C y and y →_C x. Then we have that, for every 
non-singleton cluster C ∈ C, C ∩ {x, y} is either equal to ∅ or {x, y}, i.e. C is compatible with {x, y}. 
Furthermore, the only clusters that can possibly be in C|{x, y} are {x}, {y} and {x, y} and these 
are all mutually compatible. So {x, y} is an ST-set, contradicting the fact that C is ST-collapsed. 
Hence →_C is a partial order. The fact that the terminals are the maximal elements of the partial 
order then follows immediately from their definition. Finally, observe that a non-singleton cluster 
C must contain at least one terminal, because if it does not then the relation →_C induces a cycle 
on some subset of C, contradicting the aforementioned anti-symmetry property. □ 

Let T be a phylogenetic tree on X. For a vertex u of T we define X(u) ⊆ X to be the set of 
all taxa that can be reached from u by directed paths. For a taxon x ∈ X we define W^T(x), the 
witness set for x in T, as X(u) \ {x}, where u is the parent of x. A critical property of W^T(x) is 
that, for any non-singleton cluster C ∈ Cl(T) that contains x, W^T(x) ⊆ C [17]. 

Observation 3. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two 
trees on X with no non-trivial common pendant subtrees, and r(C) ≥ 1. Then for any x ∈ X the 
following statements are equivalent: (1) x is a terminal of C; (2) there exist incompatible clusters 
C1, C2 ∈ C such that C1 ∩ C2 = {x}; (3) W^{T1}(x) ∩ W^{T2}(x) = ∅. 

Proof. We first prove that (2) implies (1). For x' ∉ C1 ∪ C2 it holds that x ↛_C x', because x ∈ C1 but 
x' ∉ C1. For x' ∈ C1 \ C2 it cannot hold that x →_C x', because x ∈ C2 but x' ∉ C2, and this holds 
symmetrically for x' ∈ C2 \ C1. Hence x is a terminal. We now show that (1) implies (3). Suppose 
(3) does not hold. Then there exists some taxon x' ∈ W^{T1}(x) ∩ W^{T2}(x). So every non-singleton 
cluster in C that contains x also contains x', irrespective of whether the cluster came from T1 or 
T2. But then x →_C x', so (1) does not hold. Hence (1) implies (3). Finally, we show that (3) implies 
(2). Note that (3) implies that in both T1 and T2 the parent of x is not the root. If this was not 
so, then (wlog) W^{T1}(x) = X \ {x}, and combining this with the fact that W^{T1}(x), W^{T2}(x) ≠ ∅ 
would contradict (3). Hence W^{T1}(x) ∪ {x} ∈ Cl(T1) and W^{T2}(x) ∪ {x} ∈ Cl(T2), from which (2) 
follows. □ 

2 Note that, if a taxon x appears in only one cluster, {x}, then (vacuously) x →_C y for all y ∈ X. 
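Condition (3) of Observation 3 gives a terminal test that works directly on the two trees. The sketch below assumes, for illustration only, that trees are nested tuples with taxon names at the leaves; the function names and the two small trees are hypothetical.

```python
def leaves(node):
    """All taxa below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))

def witness_set(tree, x):
    """W^T(x): the taxa below the parent of leaf x, excluding x itself."""
    if isinstance(tree, str):
        return None
    for child in tree:
        if child == x:
            return leaves(tree) - {x}
        found = witness_set(child, x)
        if found is not None:
            return found
    return None

def is_terminal_by_witness(t1, t2, x):
    """Condition (3): the witness sets of x in the two trees are disjoint."""
    return witness_set(t1, x).isdisjoint(witness_set(t2, x))

# Hypothetical trees with no non-trivial common pendant subtree.
tree1 = (("a", "b"), "c")
tree2 = ("a", ("b", "c"))
```

For these two trees the witness sets of b are {a} and {c}, which are disjoint, so b is a terminal; a and c each have b in both witness sets and are not.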

For two nodes u ≠ v in a network we define a tree path from u to v as a directed path that 
starts at u and ends at v such that all interior nodes of the path are tree nodes. This definition 
includes the possibility that u and/or v are reticulation nodes; this will be clear from the specific 
context. Observe that if x ≠ y are taxa in a network N that represents a set of clusters C and 
there is a tree path from the parent of x to y, then x →_C y. The set of nodes reachable by a tree 
path from u is the set of all v ≠ u such that there is a tree path from u to v. 

Lemma 3. Let C be an ST-collapsed set of clusters on X such that r(C) ≥ 1. Then C has at most 
3 · r(C) terminals. 

Proof. Let N be a network on X such that N represents C and r(N) = r(C). Without loss of 
generality we can assume N is binary. For each x ∈ X, exactly one of the following conditions 
holds: (1) the parent of x in N is a reticulation; (2) the parent of x in N is not a reticulation 
but there is a directed path from the parent of x in N to a reticulation. To see this observe 
that if neither condition holds then N contains an edge (u, v) such that at least two taxa, but 
no reticulations, are reachable by directed paths from v. But then C contains a non-singleton 
ST-set, contradiction. Let R(N) be the reticulation nodes in N. Let Ω(C) ⊆ X denote the set of 
terminals of C. We describe a function F : Ω(C) → R(N) such that each reticulation is mapped to 
at most 3 times, from which the result follows. For each terminal x for which condition (1) holds, 
F(x) = p(x), where p(x) is the parent of x. For each terminal x for which condition (2) holds, 
choose a reticulation r such that there is a tree path from p(x) to r, and set F(x) = r. Note that 
there cannot ever be a tree path from p(x) to y if x ≠ y are both terminals, because this would 
mean x →_C y. Now, it follows that a reticulation can be mapped to (in F) in at most 3 ways: from 
a terminal immediately below it and from one terminal per incoming edge. □ 

Corollary 2. Let C be an ST-collapsed set of clusters on X such that r(C) ≥ 1. Any subset of 
terminals with cardinality 2 · r(C) + 1 or higher contains at least one taxon x such that r(C \ {x}) < 
r(C). 

Proof. From the proof of Lemma 3 we observe that in any subset of 2 · r(C) + 1 terminals, there 
exists at least one taxon x for which condition (1) holds. Hence x is an SBR and (as argued in 
Observation 1) r(C \ {x}) < r(C). □ 

5 Main result 

For a reticulation r in a network N, let X^t(r) be the set of all taxa that can be reached by tree 
paths from r. For example, if we label the reticulations in the network in Figure 2 r1, r2, r3, from 
left to right, X^t(r1) = {b}, X^t(r2) = {c} and X^t(r3) = {e}. The following lemma shows that an 
optimal network cannot contain a reticulation r such that X^t(r) = ∅. 

Lemma 4. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees 
on X with no non-trivial common pendant subtrees, and r(C) ≥ 1. Let N be a network on X that 
represents C and let r be a reticulation of N such that X^t(r) = ∅. Then r(C) < r(N). 

Proof. Let R^t(r) be the set of reticulations in N reachable by tree paths from r. Now, consider 
the technique described in the proof of Corollary 1 for dismantling N by removing one SBR at a 
time. All reticulations in R^t(r) will be pruned away at an iteration that is earlier than or equal to 
the iteration in which r is pruned away. Moreover, due to the fact that X^t(r) = ∅ (that is, there 
are no taxa "sandwiched" between r and R^t(r)) there definitely exists r' ∈ R^t(r) such that r' and 
r both vanish in the same iteration. But this means that the technique produces an ST-set tree 
sequence of length strictly less than r(N), which (by Lemma 2 or [17, Theorem 3]) implies the 
existence of a network N' that represents C such that r(N') < r(N). □ 

Corollary 3. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees 
on X with no non-trivial common pendant subtrees, and r(C) ≥ 1. Let N be a network on X that 
represents C such that r(N) = r(C) and let r be a reticulation of N such that X^t(r) = {x} for 
some x ∈ X. Then r(C \ {x}) = r(C) - 1. 

Proof. If x is an SBR the result is immediate. Otherwise, if x is deleted from N, then a network N' 
is obtained such that N' represents C \ {x} and, in N', X^t(r) = ∅. By Lemma 4, r(C \ {x}) < r(N'). 
The result follows because r(N') = r(N) = r(C) and, by Lemma 1, r(C \ {x}) ≥ r(C) - 1. □ 

For a network N, we say that a switching of N is obtained by, for each reticulation node, 
deleting all but one of its incoming edges. The red subtrees in Figure 2 are switchings. A network 
N on X displays a tree T on X if and only if there is a switching T_N of N such that T can be 
obtained from T_N by suppressing nodes with indegree and outdegree equal to one (and if necessary 
deleting nodes with indegree 0 and outdegree 1). Hence, each switching is the "image" in N of some 
tree displayed by N. Indeed, the following definitions are entirely consistent with the definition 
of cluster representation given in Section 2. Given a network N and a switching T_N of N, we say 
that an edge (u, v) of N represents a cluster C w.r.t. T_N if (u, v) is an edge of T_N and C is the set 
of taxa descendants of v in T_N. It is natural to define that an edge (u, v) of N represents a cluster 
C if there exists some switching T_N of N such that (u, v) represents C w.r.t. T_N. 
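Switchings, and the clusters that edges represent with respect to them, can be enumerated by brute force on small networks. The sketch below is exponential in the number of reticulations and is meant only to illustrate the definitions; the edge-list encoding, function names and example network are assumptions.

```python
from itertools import product
from collections import defaultdict

def switchings(edges):
    """Yield every switching: keep exactly one incoming edge per reticulation."""
    incoming = defaultdict(list)
    for u, v in edges:
        incoming[v].append((u, v))
    tree_edges = [es[0] for es in incoming.values() if len(es) == 1]
    choices = [es for es in incoming.values() if len(es) > 1]
    for kept in product(*choices):
        yield tree_edges + list(kept)

def softwired_clusters(edges, taxa):
    """All clusters represented by some edge w.r.t. some switching."""
    result = set()
    for switching in switchings(edges):
        children = defaultdict(list)
        for u, v in switching:
            children[u].append(v)

        def below(node):
            if node in taxa:
                return frozenset([node])
            return frozenset().union(*(below(c) for c in children[node]))

        result |= {below(v) for _, v in switching}
    # Clusters are non-empty proper subsets of the taxon set.
    return result - {frozenset(), frozenset(taxa)}

# Hypothetical network on {a, b, c} with a single reticulation h above b.
example_edges = [("root", "u"), ("root", "v"), ("u", "a"), ("u", "h"),
                 ("v", "h"), ("v", "c"), ("h", "b")]
```

The example network has two switchings; together they represent {a, b} and {b, c} (plus the singletons), which are exactly the clusters of the trees ((a,b),c) and (a,(b,c)).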

We say that a cluster C ∈ C is minimal if it is a non-singleton cluster such that there does not 
exist a non-singleton cluster C' ∈ C with C' ⊂ C. 
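Minimal clusters are easy to extract from a cluster set. A minimal sketch, assuming clusters are encoded as sets and using a hypothetical cluster set on five taxa:

```python
def minimal_clusters(clusters):
    """Non-singleton clusters that contain no other non-singleton cluster."""
    nonsingletons = {frozenset(c) for c in clusters if len(c) > 1}
    return {c for c in nonsingletons
            if not any(d < c for d in nonsingletons)}

# Hypothetical cluster set: {a, b, c} is not minimal because it contains {a, b}.
example_clusters = [frozenset(c) for c in
                    ("a", "b", "c", "d", "e", "ab", "abc", "bc", "de")]
```

Here the minimal clusters are {a, b}, {b, c} and {d, e}.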

Lemma 5. Let C = Cl(T) be a set of clusters on X, where T = {T1, T2} is a set of two trees 
on X with no non-trivial common pendant subtrees, and r(C) ≥ 1. There exists a minimal cluster 
C ∈ C such that, for at least |C| - 1 of the taxa x in C, r(C \ {x}) = r(C) - 1. 

Proof. Let N be a binary network that represents C such that r(C) = r(N). Let e = (u, v) be an 
edge of N that represents some non-singleton cluster of C such that there does not exist another 
edge e' = (u', v') reachable from e with this property (where reachable here means: there is a 
directed path from v to u'). Hence e is a "lowest" edge that represents a non-singleton cluster. 
Let C ∈ C be a non-singleton cluster represented by e. We will prove that at least |C| - 1 taxa x 
in C have the property r(C \ {x}) = r(C) - 1. Observe that this property will then automatically 
also hold for all non-singleton clusters C' ⊆ C, in particular minimal C', from which the claim 
will follow. 

By definition e = (u, v) is an edge of some switching T_N of N such that C is equal to the set of 
taxa descendants of v in T_N. Fix any such T_N. Observe firstly that if there is a directed path in T_N 
from v to some reticulation r, then X^t(r) ⊆ C. The next statement is critical. Suppose there is a 
tree node v' which is reachable in T_N by a directed path from v. Suppose furthermore that, in N, 
the set of all taxa X' reachable from v' by tree paths (in N) has cardinality exactly 2. We show 
that this situation cannot actually happen. To see this, let {y, z} be the taxa in X'. By assumption 
{y, z} is not an ST-set, because C is ST-collapsed. Hence there must exist a non-singleton cluster 
C* ∈ C such that without loss of generality C* ∩ {y, z} = {y}. Now, C* must be represented by 
some edge e' = (u', v') of N. Moreover, e' must lie somewhere on the tree path from v' to y in 
T_N. However, u' is then reachable by a directed path from v, contradicting our claim that e was 
"lowest". So such an X' does not exist. Now, suppose that r is a reticulation in T_N such that (1) r 
can be reached in T_N by a directed path from v, (2) two or more taxa can be reached in T_N from 
r by tree paths. Due to the fact that N is binary, there must exist a tree node v' reachable in T_N 
by a tree path from r, such that {x, y} are the only two taxa reachable from v' by tree paths in 
T_N. We have already concluded, however, that this is not possible. Hence we can infer that, if r 
is a reticulation in T_N such that r can be reached by a directed path from v, |X^t(r)| = 1. This, 
in turn, means that with one possible exception (because there can be at most one taxon in C 
reachable in T_N from v by a tree path) each taxon x ∈ C is such that X^t(r) = {x} for some r, i.e. x 
is either an SBR or is the unique taxon "sandwiched" between several reticulations. By Corollary 
3 we are done. □ 

An immediate consequence of Lemma 5 is that if we could identify the minimal cluster $C$, it
would be sufficient to restrict our attention to an arbitrary size-2 subset of it: we could still be
sure that at least one of the two taxa $x$ is such that $r(\mathcal{C} \setminus \{x\}) = r(\mathcal{C}) - 1$. This is the motivation
behind the following theorem.

Theorem 1. Let $\mathcal{C} = Cl(\mathcal{T})$ be a set of clusters on $X$, where $\mathcal{T} = \{T_1, T_2\}$ is a set of two trees on
$X$ with no non-trivial common pendant subtrees, and $r(\mathcal{C}) \geq 1$. Let $X' \subseteq X$ be the set constructed
as follows. If there are strictly more than $2 \cdot r(\mathcal{C})$ terminals in $\mathcal{C}$, let $X'$ be an arbitrary subset
of the terminals of cardinality $2 \cdot r(\mathcal{C}) + 1$. Otherwise, for each minimal cluster $C \in \mathcal{C}$, put two
arbitrary taxa from $C$ in $X'$, of which at least one is a terminal. Then $|X'| \leq 6 \cdot r(\mathcal{C})$ and there
exists $x \in X'$ such that $r(\mathcal{C} \setminus \{x\}) = r(\mathcal{C}) - 1$.

Proof. The first way of constructing $X'$ is correct by Corollary 2. Let us then assume that there
are at most $2 \cdot r(\mathcal{C})$ terminals. Recall that each (minimal) cluster contains at least one terminal,
by Observation 2. A terminal can appear in at most one minimal cluster from $T_1$, and at most one
minimal cluster from $T_2$. Consider the following mapping from $X'$ to itself. Map each terminal to
itself. For each non-terminal $y \in X'$, map $y$ (arbitrarily) to a terminal $x \in X'$ such that $x$ and $y$
are both in some minimal cluster of $\mathcal{C}$. In this mapping, a terminal can be mapped onto at most
3 times (i.e. from itself and at most two non-terminals). Since there are at most $2 \cdot r(\mathcal{C})$ terminals,
$|X'| \leq 3 \cdot 2 \cdot r(\mathcal{C}) = 6 \cdot r(\mathcal{C})$. □
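
The two-case construction of $X'$ in Theorem 1 can be sketched in a few lines. This is an illustrative sketch only: it assumes the set of terminals has already been computed by some other routine and is passed in, that clusters are frozensets of taxon names, and that every minimal cluster contains a terminal (as Observation 2 guarantees). The names `candidate_set` and `minimal_clusters` are hypothetical, and "arbitrary" choices are resolved alphabetically purely for reproducibility.

```python
from typing import FrozenSet, Iterable, List, Set

Cluster = FrozenSet[str]


def minimal_clusters(clusters: Iterable[Cluster]) -> List[Cluster]:
    """Non-singleton clusters that strictly contain no other non-singleton cluster."""
    big = {c for c in clusters if len(c) > 1}
    return sorted((c for c in big if not any(o < c for o in big)), key=sorted)


def candidate_set(clusters: Iterable[Cluster], terminals: Set[str], r: int) -> Set[str]:
    """Build X' as in Theorem 1, for an ST-collapsed cluster set with r(C) = r >= 1."""
    # Case 1: more than 2r terminals, so any 2r + 1 of them suffice.
    if len(terminals) > 2 * r:
        return set(sorted(terminals)[: 2 * r + 1])
    # Case 2: two taxa per minimal cluster, at least one of them a terminal.
    chosen: Set[str] = set()
    for c in minimal_clusters(clusters):
        terminal = min(c & terminals)   # raises ValueError if the assumption fails
        partner = min(c - {terminal})   # any second taxon of the cluster
        chosen.update((terminal, partner))
    return chosen
```

The second branch picks at most two taxa per minimal cluster, and the counting argument in the proof is what bounds the total by $6 \cdot r(\mathcal{C})$; the code itself does not enforce that bound.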

6 The algorithm 

We describe the algorithm non-deterministically to keep the exposition as clear as possible.

Input: Two trees $\mathcal{T} = \{T_1, T_2\}$ on the same set of taxa $X$.

Output: A network $N$ that displays binary refinements of $T_1$ and $T_2$ such that $r(N) = h(\mathcal{T})$.



Algorithm 1

 1: set $\mathcal{C} := Cl(\mathcal{T})$
 2: guess $r = h(\mathcal{T}) = r(\mathcal{C})$
 3: for $i := r$ downto $1$ do
 4:     collapse all maximal ST-sets (i.e. maximal common pendant subtrees) in $\mathcal{C}$ to obtain a set of clusters $\mathcal{C}'$
 5:     if $\mathcal{C}'$ contains more than $2i$ terminals then
 6:         set $X'$ to be an arbitrary size $2i + 1$ subset of the terminals
 7:     else
 8:         construct $X'$ by taking two taxa from each minimal cluster of $\mathcal{C}'$, such that at least one of each pair is a terminal
 9:     end if
10:     guess an element $x \in X'$ such that $r(\mathcal{C}' \setminus \{x\}) = r(\mathcal{C}') - 1$ and record $x_{r-i+1} := x$
11:     set $\mathcal{C} := \mathcal{C}' \setminus \{x\}$
12: end for
13: convert the sequence $(x_1, \ldots, x_r)$ into the ST-set tree sequence $\mathcal{S} = (S_1, \ldots, S_r)$ of $\mathcal{C}$ by decollapsing taxa
14: use $\mathcal{S}$ to construct a binary network $N$ with $r(N) = h(\mathcal{T})$ that displays binary refinements of $T_1$ and $T_2$ (see Lemma [?])
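
One conventional way to make the two "guess" steps deterministic is to try candidate values of $r$ in increasing order and to branch over every element of $X'$. The sketch below covers only lines 1 to 12 under stated assumptions: the ST-set collapsing routine and the construction of $X'$ are supplied by the caller as the hypothetical callables `collapse` and `candidates`, clusters are frozensets of taxon names, and $r(\mathcal{C}) = 0$ is tested via pairwise compatibility of the clusters. Decollapsing and network construction (lines 13 and 14) are not attempted, and the early exit on compatible input is a convenience that the pseudocode does not spell out.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Set

Cluster = FrozenSet[str]
Clusters = Set[Cluster]


def is_compatible(clusters: Clusters) -> bool:
    """True if every pair of clusters is disjoint or nested, i.e. a tree represents them."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(clusters, 2))


def remove_taxon(clusters: Clusters, x: str) -> Clusters:
    """Delete taxon x from every cluster, dropping clusters that become empty."""
    return {c - {x} for c in clusters if c - {x}}


def search(clusters: Clusters, budget: int,
           collapse: Callable[[Clusters], Clusters],
           candidates: Callable[[Clusters, int], Iterable[str]]) -> Optional[List[str]]:
    """Return a list of at most `budget` removed taxa leaving compatible clusters, or None."""
    collapsed = collapse(clusters)                 # line 4 (caller-supplied)
    if is_compatible(collapsed):
        return []
    if budget == 0:
        return None
    for x in candidates(collapsed, budget):        # lines 5-10: branch over X'
        rest = search(remove_taxon(collapsed, x), budget - 1, collapse, candidates)
        if rest is not None:
            return [x] + rest
    return None


def solve(clusters: Clusters, n_taxa: int, collapse, candidates) -> Optional[List[str]]:
    """Realise the guess in line 2 by trying r = 0, 1, 2, ... in turn."""
    for r in range(n_taxa + 1):
        sequence = search(clusters, r, collapse, candidates)
        if sequence is not None:
            return sequence
    return None
```

The branching factor at depth $i$ is $|X'|$, so the size of the search tree depends entirely on how small the supplied `candidates` callable keeps that set.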



The correctness of the algorithm is primarily a consequence of Lemma 5 and Corollary 2. If we let
$r = r(\mathcal{C})$, the running time is at most $(6^r r!) \cdot r \cdot poly(n)$ where $n = |X|$. The single $r$ term comes
from line 2. The $(6^r r!)$ term is a consequence of Theorem 1: $|X'|$ never rises above $6r$, and each
iteration of the main loop is assumed to reduce the reticulation number by 1, giving a running
time of at most $(6r)(6(r - 1))(6(r - 2)) \cdots = 6^r r!$. The $poly(n)$ term includes operations such
as computing terminals, locating minimal clusters and collapsing maximal ST-sets; the first two
operations are clearly polynomial-time because $|Cl(\mathcal{T})| \leq 4(n - 1)$ (which follows from the fact that
a tree on $n$ taxa contains at most $2(n - 1)$ edges). In fact, the most time-consuming operation
inside the $poly(n)$ term is collapsing maximal ST-sets (i.e. maximal common pendant subtrees).
In [17, Lemma 5] a naive $O(n^4)$ algorithm is given for this, although with intelligent use of data
structures, and by exploiting the fact that $\mathcal{C}$ comes from two trees, $O(n^2)$ is certainly possible without
too much effort. Finally, we note that the single $r$ term can be absorbed, if necessary, into the
$poly(n)$ term to give $(6^r r!) \cdot poly(n)$, because (trivially) $r < n$.
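
The product behind the $6^r r!$ term can be checked numerically. The short sketch below multiplies the per-level branching bounds $6i$ for $i = r, r-1, \ldots, 1$ and compares the result with the closed form; the function name `search_tree_bound` is illustrative.

```python
from math import factorial


def search_tree_bound(r: int) -> int:
    """Upper bound on the number of leaves of the search tree: (6r)(6(r-1))...(6*1)."""
    leaves = 1
    for i in range(r, 0, -1):
        leaves *= 6 * i   # at level i the candidate set X' has at most 6i elements
    return leaves


# The product agrees with the closed form 6^r * r! for small r.
for r in range(0, 8):
    assert search_tree_bound(r) == 6 ** r * factorial(r)
```

For example, $r = 3$ gives $18 \cdot 12 \cdot 6 = 1296 = 6^3 \cdot 3!$.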